Identification of conflicting selective effects on highly expressed genes.

Many different selective effects on DNA and proteins influence the frequency of codons and amino acids in coding sequences. Selection is often stronger on highly expressed genes. Hence, by comparing high- and low-expression genes it is possible to distinguish the factors that are selected by evolution. It has been proposed that highly expressed genes should (i) preferentially use codons matching abundant tRNAs (translational efficiency), (ii) preferentially use amino acids with low cost of synthesis, (iii) be under stronger selection to maintain the required amino acid content, and (iv) be selected for translational robustness. These effects act simultaneously and can be contradictory. We develop a model that combines these factors, and use Akaike's Information Criterion for model selection. We consider pairs of paralogues that arose by whole-genome duplication in Saccharmyces cerevisiae. A codon-based model is used that includes asymmetric effects due to selection on highly expressed genes. The largest effect is translational efficiency, which is found to strongly influence synonymous, but not non-synonymous rates. Minimization of the cost of amino acid synthesis is implicated. However, when a more general measure of selection for amino acid usage is used, the cost minimization effect becomes redundant. Small effects that we attribute to selection for translational robustness can be identified as an improvement in the model fit on top of the effects of translational efficiency and amino acid usage.


Introduction
The genome of S. cerevisiae retains many pairs of paralogous genes that arose by whole-genome duplication (Kellis et al. 2004). These pairs of paralogues provide a large set of comparable genes that have been evolving and diverging subject to the same selective forces for the same amount of time. They are thus an ideal data set on which to test some of the fundamental theories about how selection acts on gene sequences. We begin by considering several well documented effects. Firstly, selection for efficiency of translation is an important factor causing biased codon usage in S. cerevisiae (Percudani et al. 1997;Akashi, 2003), Drosophila (Powell and Moriyama, 1997), humans (Comeron, 2006) and other species (Chen et al. 2004;Rocha, 2004). Preferred codons are those for which the tRNAs are most abundant, and for which the gene copy number is largest. Codon bias is observed to increase as a function of expression level, suggesting that translational effi ciency is most important in highly expressed genes. Although translational selection is usually considered as selection for increasing the speed of translation, selection for translational accuracy may also play a role (Akashi, 1994).
Secondly, amino acid composition also varies with expression level. It has been argued that amino acids with high total tRNA abundance are preferred in high-expression genes (Lobry and Gautier, 1994;Akashi, 2003), i.e. translational effi ciency infl uences amino acid composition as well as codon usage. It has also been found that selection can act to minimize the energetic costs of amino acid synthesis. As a result, highly expressed genes have higher frequencies of the least costly amino acids (Akashi and Gojobori, 2002;Jansen and Gerstein, 2000).
Thirdly, in addition to the selective effects above, amino acid composition of proteins is also infl uenced by mutation pressure arising from the unequal base frequencies in the DNA (Knight et al. 2001;Bharanidharan et al. 2004;Urbina et al. 2006). If the GC content is high or low, this favours the amino acids whose codons have high or low GC content. We show below that in high-expression genes, selection on amino acid usage causes amino acid frequencies to differ more from the expectations based on GC content than in low-expression genes. A similar effect was seen in pseudogenes (Echols et al. 2002).
Fourthly, Drummond et al.(2005) showed that high-expression genes in S. cerevisiae evolve more slowly than low-expression genes, and proposed a new hypothesis of translational robustness to explain this. Translational error leads to amino acid replacements in proteins that may cause misfolding. The translational robustness theory proposes that proteins are selected to be tolerant to the effects of translational error, i.e. the amino acid sequence is selected to fold properly despite translational errors. A direct test of this hypothesis would involve an experimental study of how proteins fold when they contain amino acid substitutions. However, if the hypothesis is true, it should have the following observable consequences on the frequencies of codons and amino acids that are subject to test by sequence analysis. High-expression genes should preferentially use codons with higher numbers of synonymous neighbour codons because synonymous errors will have no effect. Highexpression genes should also preferentially use codons where errors lead to replacements by amino acids with similar physical properties, because these errors should have less disruptive effects on protein structure.
The layout of the genetic code is itself optimized to reduce the effects of translational error (Haig and Hurst, 1991;Freeland and Hurst, 1998). Errors are more frequent at 1st and 3rd positions than 2nd (Parker, 1989;Freeland and Hurst, 1998), and it is found that the genetic code is arranged such that 1st and 3rd position errors cause smaller changes in physical properties than 2nd position changes. Studies on mitochondrial gene sequences clearly illustrate this positional effect (Urbina et al. 2006) and show that the variability in frequencies of bases and amino acids among species is strongly dependent on the genetic code structure. Archetti (2004Archetti ( , 2006 has found evidence that selection for error minimization at the protein level can infl uence codon usage in Drosophila and rodents. However, Marquez et al.(2005) did not fi nd evidence for this effect.
There are several, potentially confl icting, selective effects on gene sequences. Thus, a statistical method is required that will allow many factors to be considered simultaneously and their relative magnitudes to be assessed. We make use of Akaike's Information Criterion (AIC), derived from Maximum Likelihood theory, which provides a principled method of model selection (Burnham and Anderson, 1998).

Materials and Methods
A total of 290 pairs of paralogues with expression level published in Drummond et al. (2005) were examined. The DNA sequence of each gene was extracted from the complete S. cerevisiae genome (Goffeau et al. 1996). The DNA sequence of each gene was translated to a protein sequence and the protein pairs were aligned using ClustalW (Thompson et al. 1994). Nucleotide sequence alignments were created from the protein alignments by replacing each amino acid with its corresponding codon. The total number of codons in this data set is n tot = 146398. Codon-based models will be used with 32 states that correspond to codon blocks translated by distinct groups of tRNAs. Each single-codon amino acid and each two-codon amino acid is a single block. Ile is two blocks: AUY and AUA. Four codon amino acids are treated as two blocks of pryimidine and purine-ending codons (e.g. Val = GUY and GUR). The 6-codon amino acids were treated as 3 blocks: Leu = UUR, CUY and CUR; Ser = UCY, UCR and AGY; Arg = CGH, CGG and AGR (where H denotes U, C or A). These blocks were based on the set of tRNA genes in the S. cerevisiae genome (Percudani et al. 1997) and the wobble pairing rules. The grouping of pyrimidine-ending pairs together is clear, because in every case they are translated by a single type of tRNA. Some purine-ending pairs have a single type of tRNA with U at the wobble position, but others have two tRNA types with U and C at the wobble position. It is expected that tRNAs with C at the wobble position translate only G-ending codons, but those with U translate both A-and Gending codons. For this reason, A-and G-ending codons are not independent, so we grouped them into a single block. The Arg(CGN) family is an exception, because the two tRNA types have anticodon ICG (translating CGU, CGC, and CGA codons) and CCG (translating only CGG). Hence, these codons were split into a block of 3 and a single-codon block.

Causes of asymmetry
Let n ij be the number of sites at which state i in the low-expression gene is aligned with state j in the high-expression gene (states i and j are one of the 32 codon blocks defi ned above). Let Δn ij = n ij -n ji . If the mutational process is symmetrical with respect to interchange of high and low expression genes, we would expect Δn ij = 0 (plus or minus statistical variation) for every i and j. Signifi cant deviation of Δn ij from 0 indicates an asymmetry in the substitution process between high and low expression genes. We now propose a series of factors that might be expected to cause such an asymmetry. For each proposed asymmetry factor, we defi ne a function δ(i, j) that is proportional to the predicted strength of the effect for substitutions between states i and j.
The fi rst predicted effect is that selection for translational effi ciency causes a preference for codons with a higher number of matching tRNA genes and that this preference will be stronger in high expression genes. Let N tRNA (i) be the number of tRNA gene copies for codon block i ( Table 1). The increase in copy number when mutating from i to j is δ tRNA (i, j) = N tRNA ( j) -N tRNA (i). According to this prediction, Δn ij should be positive if the number of tRNAs for codon block j is larger than for i.
The next prediction is that selection will cause a preference for amino acids of low synthetic cost and that this preference will be stronger in high expression genes. We use two different estimators of the cost of amino acid synthesis (Table 2). N ATP (a) is the number of ATP molecules required for biochemical synthesis of amino acid a (Akashi and Gojobori, 2002) and MW(a) is the molecular weight of a, which has been argued to be a more general cost measure than the ATP cost (Seligmann, 2003). According to these two measures, the savings in cost due to a mutation from i to j are δ ATP (i, j) = N ATP (a(i)) -N ATP (a( j)) and δ MW (i, j) = MW(a(i)) -MW(a(j)), where a(i) and a( j) are the amino acids coded by blocks i and j. Note that the order of the indices i and j in these two defi nitions is reversed in comparison to δ tRNA (i, j) because the prediction is that there will be a decrease in cost but and increase in N tRNA . In each case, the sign of δ(i, j) is chosen so that a positive value is a predictor of a positive Δn ij . In absence of selection, the frequency of the bases should be controlled by the equilibrium frequencies of the mutation process. If the equilibrium GC frequency is φ, the frequencies of the bases should be φ G = φ C = φ/2 and φ A = φ U = (1 -φ)/2. If a gene is under no selection other than the fact that stop codons are not used within the gene, the frequency of codon XYZ should be φ We obtain predicted amino acid frequencies by summing these codon frequencies for each amino acid. We estimate that φ = 0.3464 from the GC content at fourfold-degenerate sites. Table 2 shows the predicted and observed frequencies, f pred (a) and f ave (a). We use 'ave' to denote the average of observed frequencies in high-and low-expression genes. The difference Δf (a) = f ave (a) -f pred (a) is a measure of selection on amino acid usage (AAU). This could be positive or negative, according to whether the amino acid is preferred or avoided. The next predicted asymmetry effect is that highexpression genes should be under stronger AAU selection than low-expression genes, i.e. if Δf (a) is positive, the frequency of a should be higher in high expression genes than low expression genes, and vice versa if Δf (a) is negative. We defi ne δ AAU (i, j) = Δf (a(j)) -Δf (a(i)) as a predictor of the asymmetry caused by AAU selection.
We will say that two codons are neighbours in the genetic code if they differ by only one of the three positions. In Table 1, N SN (i) is the number of synonymous neighbour codons of any one codon in block i. We now defi ne δ SN (i,j) = N SN (j) -N SN (i). According to the translational robustness hypothesis, sequences are selected so that the effects of errors due to codon-anticodon mispairing during translation are minimized. It should, therefore, be preferable to use amino acids with fourcodon blocks (N SN (i) = 3) rather than those with two-codon blocks (N SN (i) = 1) because mispairing at the third position has no effect in four-codon blocks. This is the principal effect measured by δ SN (i, j). In four-codon amino acids, both codon blocks have N SN (i) = 3, therefore δ SN (i, j) = 0 for synonymous substitutions. Thus δ SN (i, j) is principally a predictor of change in amino acid usage, not codon usage. However, for six-codon amino acids (Arg, Leu and Ser), there is also a codon usage effect because the numbers of synonymous neighbours for codons in the three codon blocks are not equal. A well known effect in molecular evolution is that amino acid substitutions tend to be conservative, i.e. they occur more frequently between amino acids with similar physical properties. Our study of mitochondrial proteins (Urbina et al. 2006) shows that those amino acids that can most easily be replaced by other amino acids with similar physical properties respond most easily to changes in mutation pressure and therefore have the most variable frequencies among species. In particular, the amino acids in the fi rst two columns of the genetic code (with U and C and the second codon position) are much more easily interchangeable than those in the third and fourth columns (with A and G at second position). This observation leads us to another predicted asymmetry effect. If translational robustness is important, codon blocks should be preferred if neighbouring codons code for amino acids with similar physical properties. We will defi ne amino acids as 'close' if their physical property distance is less than a threshold value. The defi nition of the distance measure and the list of pairs of close amino acids are given in detail later in the paper. Let N CN (i) be the number of close neighbours of a codon in block i-i.e. the number of codons differing by error at only one position that code for a close amino acid. Synonymous codons are included in this count because an identical amino acid is obviously 'close'. As translational errors are more frequent at 1st and 3rd position than 2nd, we include only codons that are neighbours by 1st or 3rd position errors in the count of N CN (i). This gives a number between 0 and 6. Hence, we defi ne δ CN (i, j) = N CN ( j) -N CN (i) as a predictor of asymmetry. The N CN (i) measure has some similarities with the scoring system used by Archetti (2004) in his study of the infl uence of protein-level error minimization on codon usage. However, Archetti's system models mutations, whereas our system models errors in translation, as we intend it to be a predictor of translational robustness. Archetti (2004) includes multiple mutations (up to 10) and specifi cally makes the point that most synonymous codons would have the same score if only a single mutation were allowed. As we are interested in translational error, we assume that the chance of codon-anticodon mispairing would be negligible if there were more than one mismatch. Therefore, we only allow for a single position error. Most synonymous codons have the same value of N CN (i), although there are some differences among codons for six-codon amino acids. Thus δ CN (i, j), is principally a predictor of asymmetry in amino acid usage, not codon usage, as was also the case with δ SN (i, j).

Simple tests for asymmetry
For each asymmetry effect introduced above, we calculate the number of ij pairs, N pairs , for which δ(i, j) > 0. The maximum number of such pairs is 32 × 31/2 = 496, but N pairs is always less than 496 because pairs where δ(i, j) = 0 are excluded. We then count the number of pairs, N correct , where the sign of Δn ij is correctly predicted (both δ(i, j) > 0 and Δn ij > 0). We calculate the probability p that at least N correct out of N pairs pairs would have the correct sign if each sign were random. From Table 3, there is signifi cant agreement with the direction of the asymmetry predicted by tRNA, ATP, MW and AAU effects, but no evidence for SN or CN effects. However, it is premature to draw conclusions, because these tests consider each effect alone, and a large effect in one direction might mask a smaller effect in the opposite direction. It is therefore desirable to develop a statistical model in which all these effects can be considered together. Before doing this, we consider two of the asymmetry effects graphically. Table 1 gives the frequency π for each codon block (averaged over high-and low-expression genes), and the relative frequency, R ave , of the codon block with respect to the total frequency of the corresponding amino acid, f ave . We also calculated relative frequencies R high and R low separately in high-and low-expression genes. The difference in these is given in Table 1. Figure 1a shows that R ave increases as a function of the relative number of tRNAs for the codon block, measured as a fraction of the total number of tRNAs for that amino acid (only cases with more than one codon block per amino acid are considered). This confi rms that the average codon usage is infl uenced by tRNA gene copy number. Figure  1b shows that R high -R low also increases with the relative tRNA number. Thus, codon bias is stronger in high-expression than low-expression genes, as expected. Figure 2a shows the relationship between f ave and f pred . Although there is a positive correlation (r = 0.75, p = 1.3 × 10 -4 ), there is considerable scatter. Table 2 shows the difference Δf (a) between these two frequencies, and also the difference between the observed frequencies in high-and low-expression genes, f high -f low . Figure 2b shows that there is a positive correlation between these two quantities (r = 0.67, p = 1.3 × 10 -3 ). Thus, the deviations between observed and predicted frequencies are caused by selection on AAU, and this is stronger in high-expression than low-expression genes.

Selection of a substitution rate model
Given a model of the data, we can calculate expected frequencies, f ij , of each pair of states in the alignment. The log-likelihood of the data, given the model, is By defi nition, AIC = 2(-lnL + K), where K is the number of parameters that are estimated from the data, and L denotes the maximum likelihood value of L. The model that best approximates the data is the one with smallest AIC (Burnham and Anderson, 1998). A more complex model with a larger number of parameters will have higher likelihood (-lnL will be smaller). However, an overly complex model with redundant parameters has a larger K but does not significantly decrease -lnL. AIC selects a model with suffi cient parameters, but not too many. The factor of 2 in the definition is conventional, although it makes no difference to the ranking of the models. Let r H and r L be matrices describing the substitution rates in the high-and low-expression genes. The matrices of substitution probabilities in the time t since gene duplication are P H (t) = exp(tr H ) and P L (t) = exp(tr L ). The pair frequencies f ij predicted by the model are where π k is the initial frequency of state k. We begin with symmetric models, where r H and r L are equal to a single time reversible rate matrix r. Later we add asymmetric effects. We distinguish 6 categories of substitutions. Categories 1, 2 and 3 are non-synonymous substitutions requiring 1, 2 and 3 base changes. Category 4 includes synonymous substitutions with a single base change (e.g. Leu(CUR)-Leu(CUY) and Leu(CUR)-Leu(UUR)). Category 5 includes synonymous substitutions requiring 2 base changes where both of these could be synonymous single changes (e.g. Leu(CUY)-Leu(UUR)). Category 6 includes synonymous substitutions requiring 2 or 3 base changes where the individual base changes are non-synonymous (Ser(AGY)-Ser(UCY) and Ser(AGY)-Ser(UCR)).
In the standard symmetric model, S1, six parameters, α 1 ...α 6 , control the substitution rates in each category. For synonymous substitutions, r ij = α cat(i, j) κ ij π j , where cat(i,j) = 4, 5 or 6. For non-synonymous rates, r ij = α cat(i, j) κ ij π j exp(-d(a(i), a( j))/D), where cat(i,j) = 1, 2 or 3. The diagonal elements are equal to minus the sum of the other elements on the row. F o r s i n g l e -s u b s t i t u t i o n c a t e g o r i e s (1 and 4), transitions occur faster than transversions by a factor κ. We defi ne κ ij = κ for transitions and κ ij = 1 for transversions. We did not account for transition-transversion rate differences in multiple substitution categories (2, 3, 5 and 6), i.e. κ ij = 1 for all pairs. Non-synonymous rates follow a decreasing function of the distance d between amino acids. D is a parameter that controls the shape of this decreasing function. A similar model was used by Goldman and Yang (1994), but only single substitutions were permitted. An appropriate distance matrix has already been calculated from 8 physical properties of amino acids (Higgs and Attwood, 2005, chapter 2) and has been used to predict evolutionary properties of mitochondrial gene sequences (Urbina et al. 2006). The 8 properties are: 1 = volume (Creighton, 1993); 2 = bulkiness (Zimmerman et al. 1968); 3 = polarity (Zimmerman et al. 1968); 4 = isoelectric point (Zimmerman et al.1968); 5 = hydrophobicity (Kyte and Doolittle, 1982); 6 = hydrophobicity (alternative scale) (Engelman et al. 1986); 7 = surface area accessible to water (Miller et al. 1987); 8 = fraction of accessible area lost when a protein folds (Rose et al. 1985). Let z ak be the value of the kth property of amino acid a, after transforming each property so that its mean is 0 and its standard deviation is 1. The distance d(a,b) is the euclidean distance between amino acids a and b in the 8-dimensional z-space (Urbina et al. 2006).
There are 32 frequencies π i that must add up to 1; hence 31 independent parameters. We set these to the observed average frequencies of the states (Table 1). These parameters are estimated from the data; therefore they contribute 31 towards K in the AIC equation. As the total amount of data is large (n tot = 146398), the observed frequencies will be very close to the ML frequencies. These 31 parameters are treated equivalently in all models and therefore they do not affect the ranking by AIC.
We expect α 4 to be the largest of the rate parameters. Therefore we set α 4 = 1, and measure the other 5 rate parameters relative to this. Finally, rates are normalized such that the mean substitution rate is equal to 1. This sets the time scale such that t is the mean number of substitutions per site.
There are 8 parameters in S1 for which ML values must be estimated from the data (5 α values,  t, κ, and D). Hence, the total number of parameters is K = 39 for S1. The 8 ML values were found by a random hill-climbing routine beginning from an initial rough estimate. At each iteration, one random parameter was changed by a small random amount, and the new parameter was accepted if ln L increased. 30000 iterations of this process were suffi cient to give good convergence for the S series of models (at least 3 signifi cant fi gures in parameter values and less than 0.01 in lnL). For the W and A series models, 80000 iterations were used. Parameters were found to converge to the same values from several different initial points. No problems of local optima were encountered.
Optimal parameters for model S1 are given in Table 4. The rate parameters rank in the order α 4 > α 5 > α 6 > α 1 > α 2 > α 3 , which we would expect if synonymous substitutions are faster than nonsynonymous ones, and if single base changes are faster than double and triple changes. Models S2-S8 are variants that test the assumptions of model S1. Model S2 removes the difference between transitions and transversions by setting κ = 1. This leads to a large increase in AIC. In Table 5, ΔAIC is the difference in AIC with respect to S1. When interpreting ΔAIC values, it should be remembered that the relative weight to be Table 4. ML parameters for the most important models.

S1
α 1 = 0.0691; α 2 = 0.0282; α 3 = 0.0238; α 4 = 1; α 5 = 0.396; α 6 = 0.121; t = 0.757; D = 3.407; κ = 1.707. Selective Effects on Highly Expressed Genes associated with a model is proportional to exp(-ΔAIC/2)-see Burnham and Anderson (1998), section 2.9. Most of the ΔAICs in Table 5 are very large, so that the relative weights of the less well fi tting models are very small compared with the better models. A rule of thumb is that models with ΔAIC < 2 have considerable support as alternatives, those with 2 < ΔAIC < 10 have weak support, and those with ΔAIC > 10 have essentially no support. Model S2 is thus rejected with respect to S1, which confi rms that transitions are faster than transversions. In models S3, S4, and S5 some of the α parameters for multiple substitutions are set to zero. All of these lead to large increases in AIC and are thus rejected. Thus, the data are not well described by a model that allows only single base changes at a time. Models S6, S7 and S8 consider variations in the way that the rates depend on amino acid distance. In S6, there is no distance function, i.e. r ij = α cat(i,j) κ ij π j for the non-synonymous changes as well as the synonymous ones. S7 uses a gaussian distance function r ij = α cat(i,j) κ ij π j exp(-(d(a(i), a( j))/D) 2 ) for the nonsynonymous rates, and S8 uses a power law function r ij = α cat(i,j) κ ij π j /d(a(i), a( j)) β , where β is a parameter to be estimated. According to AIC, S7 and S8 are both much worse than S1 but much better than S6. This suggests that rates do decrease as a function of amino acid distance and that the exponential function models this effect better than the gaussian or the power law.
Since the amino acid distance is an important factor in the model, it is worth asking if the distance measure itself can be improved. The 8 properties used may not be equally important. We therefore assigned variable weights w k to the properties. We also added a 9th property that was not included by Higgs and Attwood (2005). This is the polar requirement scale (Woese et al.1966), which has been shown to be important in studies on the genetic code (Haig and Hurst, 1991;Freeland and Hurst, 1998), and which is likely to be a property that influences substitution rates. We define a weighted distance measure with the constraint that the 9 weights sum to 1, so that there are 8 independent weight parameters. Model W1 is equivalent to S1 except that the w k are additional parameters. These are estimated by the same numerical optimization technique, starting with an initial state where all properties are equally weighted. Model W1 is strongly preferred over model S1. The difference in AIC between these two models is 4330.6. The optimal weights for model W1 are given in Table 4. Properties 1, 3, and 6 have weights that tend to 0 during optimization. In hindsight, we could have eliminated these three properties from the model, in which case the AIC would be reduced by 6. However, we had no a priori reason why these particular three should be eliminated, and to remove them at this point would amount to 'data dredging', in the sense of (Burnham and Anderson, 1998). The W and A models considered below are variants on W1. In each of these models, all 9 weights were included, but the same three weights converged to 0 in the optimization. For each of these models, the weights were counted as 8 parameters for the AIC. The three redundant parameters make no difference to the ranking of these models. Another factor that often improves the fi t of data used for molecular phylogenetics is to allow variation in rates across sites. We added this using a number of discrete Γ-distributed rate categories (Yang, 1994). This involves addition of one extra parameter, λ, that controls the shape of the Γ distribution. Models W2, W3 and W4 are equivalent to W1 with the addition of 2, 3, and 4 rate categories. All these are improvements over W1. W3 is the best of the W series and is the reference for ΔAIC in Table 5. As W3 is the best symmetric model, the distance matrix derived from the ML weights in W3 is the most meaningful scale of distance between amino acids. Distances between typical amino acids are close to 1 on this scale. Amino acids with d < 0. Model W5 is equivalent to W3 with κ set to 1. Models W6, W7, and W8 are variants in which some of the rates of multiple substitutions are set to 0. All of these models perform much worse than W3, according to AIC. Thus, we retain W3 as the best symmetric model, and we consider the effects of asymmetry in substitution rate by introducing perturbations to this model.

Is there asymmetry in the substitution rate?
Let r ij be the rate matrix for W1. From this we can defi ne distinct rate matrices for the high-and lowexpression lineages as follows. For synonymous substitutions, and for non-synonymous substitutions, where the δ's are the asymmetry factors defi ned above, and the ε's are model parameters that quantify the strength of the effect. The principle is that positive values of the δ's should correlate with increased rates of substitution on the branch to the high-expression genes. On the branch to the lowexpression genes, these effects are reversed; hence r ij L is defi ned in the same way except that there is a negative sign in front of all the εδ terms. The values of the ε parameters must be ≥0. Negative values are not permitted by the optimization routine. Note that translational effi ciency selection could infl uence both synonymous and non-synonymous substitutions, but its effect is likely to be strongest on synonymous substitutions. Therefore we introduced parameters ε tRNA-NS in the nonsynonymous rates and ε tRNA in the synonymous rates that are optimized separately. The full asymmetric model, A8, includes all 7 ε parameters. The other A-series models include subsets of the ε's ( Table 6). All these models include 3 Γ distributed rate categories, i.e. they reduce to W3 if all the ε's are zero.
Models A1-A7 consider the asymmetry effects separately. A1 includes only the tRNA effect in the synonymous rates. This leads to a large reduction in AIC with respect to W3 (see ΔAIC* column in Table 6). A2 includes tRNA effects in both synonymous and non-synonymous rates. During parameter fi tting, ε tRNA-NS converged to 0; hence the solution is the same as model A1. There is thus no evidence for a tRNA effect on nonsynonymous substitutions, even though the effect on synonymous substitutions is large. Models A3, A4 and A5 consider ATP, MW and AUU effects individually. Each of these gives a noticeable improvement in AIC with respect to W3. Models A6 and A7 consider the two measures of translational robustness. In A6, ε SN converges to 0, and the likelihood is the same as W3, whereas in A7, there is some improvement in AIC due to the addition of ε CN . When all 7 asymmetry factors are added simultaneously (A8), the optimum solution has non-zero values for ε tRNA , ε AAU , ε SN and ε CN , but the other 3 ε's converge to zero. Model A9 includes only the four non-redundant asymmetry effects. It therefore has the same likelihood as A8 and an improved AIC because it has 3 fewer parameters. A9 is not quite the best model, because if ε CN is also eliminated, leaving only ε tRNA , ε AAU and ε SN (A10), this leads to a slight reduction in AIC. If ε CN is included instead of ε SN (A11), the result is substantially worse than either A9 or A10, and if neither ε SN or ε CN is included (A12), the result is worse again. In summary, if tRNA and AAU effects are already accounted for, then addition of either ε SN or ε CN improves the fi t. Translational robustness seems to be better modelled by ε SN than ε CN . If ε SN is already included, then addition of ε CN has only a marginal effect. This is different from the conclusion when considering these effects singly (A6 and A7). Thus, the model selected by AIC (A10) includes ε tRNA , ε AAU and ε SN . Models A12, A13 and A14 establish that if any one of these three parameters is eliminated, the quality of fi t to data is significantly worse. It is surprising that the ATP and MW effects should drop out, given that these are important when considered alone (A3, A4). There appears to be some redundancy of these factors with AAU. In model A15, we included ATP and MW effects alongside tRNA and SN effects, but excluded AAU. In this case, both ε ATP and ε MW were non-zero in the optimal solution. This gives an improvement with respect to A13 (i.e. there is some benefi t from inclusion of ATP and MW), but it is substantially worse than A10.
Inclusion of any one of the asymmetry effects is suffi cient to mean that f ij ≠ f ji for every pair. For each A model, there are 496 pairs for which f ij > f ji . Table 6 shows N correct , the number of these pairs for which Δn ij > 0, and the p values from the sign test. These may be compared with Table 3. Models with better AIC scores tend also to have larger N correct .

Discussion and Conclusions
As simultaneous multiple mutations within one codon are likely to be rare, we might expect that a mutation matrix that permits only single base changes would be suffi cient. In fact, this was not the case. This was shown initially with models S3-S5, which assume all sites evolve at constant rates. We might worry that apparent double substitutions are an artefact resulting from two separate single substitutions happening at a fast evolving site in the time that a slowly evolving site changes only once. Nevertheless, the same result is seen with models W6-W8, which account for variation in rates across sites. We have pointed out the same effect in the paired regions of RNA secondary structure (Higgs, 2000;Savill et al. 2001). This can be explained by compensatory mutation theory (Higgs, 1998), because each single mutation is likely to be deleterious, but the two together may be close to neutral. It is possible that the same thing is occurring in protein sequences, e.g. between the two families of Ser codons (parameter α 6 ). It is also possible that there is some specific mutational process that operates on groups of successive bases. In amino acid Table 6. Model selection criteria for the asymmetric models. ΔAIC is measured relative to A10. ΔAIC* is measured relative to W3.